fix #639 provide NCCL tests example #640
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
Needs approval from an approver in each of these files:
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
d77a0eb to 6dcc0f7
Is this gonna be revived?
@andreyvelich thanks for the pointers. But I was mainly looking at the MPI Operator from the "InfiniBand / RDMA setup validation / benchmarking" POV. As in, when a user creates a k8s cluster with worker nodes that support an InfiniBand-based network, how do they know that their setup is working correctly? That's where this PR caught my attention, and I was wondering if there are plans to resuscitate it.
Thanks for letting us know. I think it would be nice if you could join one of our Training WG calls to discuss it further: https://docs.google.com/document/d/1MChKfzrKAeFRtYqypFbMXL6ZIc_OgijjkvbqmwRV-64/edit#heading=h.o8oe6e5kry87 We can talk more about where those benchmarks should live and how we can validate the InfiniBand setup with MPI Operator.
Draft. I need to retest it now that I've stripped the GKE-specific stuff out of the manifest.